-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: set __module__ for objects in pandas pd.DataFrame API #55171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Whenever I try to figure out how something in numpy works I have a hard time finding out where something is defined because they use patterns like |
@jbrockmendel this has nothing to do with a EDIT: opened #55178 to keep the general discussion there |
@deepak-george there are a bunch of tests where it seems we rely on this exact "pandas.core.frame" string that will have to be updated as well. First, there are some doctest failures (i.e. docstrings that need to be updated):
And then there are also normal test failures like:
and for all, see the failing CI logs. It might be possible to fix most of them with a search replace of "pandas.core.frame.DataFrame" -> "pandas.DataFrame" |
@jorisvandenbossche Thanks for the feedback! I have made the changes like pandas.core.frame.DataFrame -> pandas.DataFrame & pylint fix Now all the tests except "Unit Tests / Linux-32-bit (pull_request)" passed. I am looking into the logs and trying to fix but not sure yet what is going on. |
@jorisvandenbossche I believe BLAS is already installed, so not sure if this should be fixed in this PR. |
@deepak-george you can ignore that error, that looks unrelated (and probably if you merge with the latest main branch, that might be resolved) |
Ok. Now all the tests including the new test added to check this functionality have passed. Is there something else that need to be done before you could merge? Please let me know the next step. |
…__module__ � Conflicts: � doc/source/user_guide/enhancingperf.rst � doc/source/user_guide/io.rst � doc/source/whatsnew/v0.25.0.rst
@jorisvandenbossche What is the next step for this PR? Could this be merged? |
@aimlnerd sorry for the slow follow-up here. But we might want to only merge this after pandas 2.2 is cut, so we merge this in the main branch for 3.0. Reason for this: in theory it could be a breaking change if someone relies on the It is the plan to branch 2.2.x in one week or two, so we can revisit this shortly! |
@jorisvandenbossche Thanks for the context! Please let me know when you want to revisit, happy to make changes and complete it! |
Sorry for the slow follow-up, but in the meantime main is targetting 3.0 for a while, so we can merge this now. @aimlnerd thanks again! |
…v#55171) Co-authored-by: Joris Van den Bossche <[email protected]>
@@ -503,3 +503,24 @@ def indent(text: str | None, indents: int = 1) -> str: | |||
"future_version_msg", | |||
"Substitution", | |||
] | |||
|
|||
|
|||
def set_module(module): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just came across this today, I do not see the benefit of having this decorator. We either have:
class DataFrame:
__module__ = "pandas"
or
@set_module("pandas")
class DataFrame:
...
There is a cost, particularly for reading code you have to chase down the definition. I'm also seeing 6.5% longer runtime to instantiate an empty class with this decorator, which is a small impact on import time for pandas.
These are certainly small costs, so if there is a gain to be had then great. But I'm just not seeing what the gain is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I don't think I had thought about that option for classes (setting it afterwards like DataFrame.__module__ = "pandas"
would move it further away, but is also an option), but we just copied the approach used in numpy.
If it has overhead, it's certainly another reason to do it the other way (and agree that too many decorators make the code harder too read)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that as the decorator is untyped, we also loose type definitions when applied to functions but I can add this to the todo list (along with a whats new) in the issue itself.
And also investigate the testing so that no public api classes/functions get missed.
Rather than holding up the merging of the open PRs, this discussion should probably be moved to the open issue also.
I'll go ahead and get the open ones merged so that I can update the issue with what's left to close out the issue.
The task was more about checking whether any other changes were needed and it was only the Series one that needed other changes. So if we want to not use the decorator this could be an easy followup rather than a blocker.
__module__
on top-level public objects #55178doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.This PR add
@set_module
decorator aroundclass pd.DataFrame
Before this PR
After this PR
This change would discourage users from incorrectly using
Inspired by the similar implementation in numpy:
numpy/numpy#12382